::p_load(sf,spdep, tmap, tidyverse,dplyr, funModeling,rgdal, blorr,corrplot,ggpubr,GWmodel,skimr,caret) pacman
In-class Exercise 5: Modeling the Spatial Variation of the Explanatory Factors of Water Point Status using Geograpgically Weighted Logistic Regression (GWLR)
Overview
Study area: Osun State, Nigeria
Osun.rds
: LGA boundaries of Osun State
Osun_wp_sf.rds
: water points within Osun State. sf point datafram
Model Variables
Dependent Variable: Water point status (functional/non-functional)
Independent variable:
distance_to_primary_road (continuous)
distance_to_secondary_road (continuous)
distance_to_tertiary_road (continuous)
distance_to_city (continuous)
distance_to_town (continuous)
water_point_population (continuous)
local_population_1km (continuous)
usage_capacity (categorical)
is_urban (categorical)
water_source_clean (categorical)
Data Import
Importing the Analytic Data
<- read_rds("rds/Osun_wp_sf.rds")
osun_wp_sf <- read_rds("rds/Osun.rds") osun
%>%
osun_wp_sf freq(input='status')
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
ℹ The deprecated feature was likely used in the funModeling package.
Please report the issue at <https://github.com/pablo14/funModeling/issues>.
status frequency percentage cumulative_perc
1 TRUE 2642 55.5 55.5
2 FALSE 2118 44.5 100.0
The Osun state consist of 55.5% functional water point and 44.5% non functional water point.
tmap_mode("view")
tmap mode set to interactive viewing
tm_shape(osun)+
tm_polygons(alpha=0.4)+
tm_shape(osun_wp_sf)+
tm_dots(col="status",
alpha=0.6)+
tm_view(set.zoom.limits = c(9,12))
EDA
summary statistics with skim
%>%
osun_wp_sf skim()
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
Name | Piped data |
Number of rows | 4760 |
Number of columns | 75 |
_______________________ | |
Column type frequency: | |
character | 47 |
logical | 5 |
numeric | 23 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
source | 0 | 1.00 | 5 | 44 | 0 | 2 | 0 |
report_date | 0 | 1.00 | 22 | 22 | 0 | 42 | 0 |
status_id | 0 | 1.00 | 2 | 7 | 0 | 3 | 0 |
water_source_clean | 0 | 1.00 | 8 | 22 | 0 | 3 | 0 |
water_source_category | 0 | 1.00 | 4 | 6 | 0 | 2 | 0 |
water_tech_clean | 24 | 0.99 | 9 | 23 | 0 | 3 | 0 |
water_tech_category | 24 | 0.99 | 9 | 15 | 0 | 2 | 0 |
facility_type | 0 | 1.00 | 8 | 8 | 0 | 1 | 0 |
clean_country_name | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
clean_adm1 | 0 | 1.00 | 3 | 5 | 0 | 5 | 0 |
clean_adm2 | 0 | 1.00 | 3 | 14 | 0 | 35 | 0 |
clean_adm3 | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
clean_adm4 | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
installer | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
management_clean | 1573 | 0.67 | 5 | 37 | 0 | 7 | 0 |
status_clean | 0 | 1.00 | 9 | 32 | 0 | 7 | 0 |
pay | 0 | 1.00 | 2 | 39 | 0 | 7 | 0 |
fecal_coliform_presence | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
subjective_quality | 0 | 1.00 | 18 | 20 | 0 | 4 | 0 |
activity_id | 4757 | 0.00 | 36 | 36 | 0 | 3 | 0 |
scheme_id | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
wpdx_id | 0 | 1.00 | 12 | 12 | 0 | 4760 | 0 |
notes | 0 | 1.00 | 2 | 96 | 0 | 3502 | 0 |
orig_lnk | 4757 | 0.00 | 84 | 84 | 0 | 1 | 0 |
photo_lnk | 41 | 0.99 | 84 | 84 | 0 | 4719 | 0 |
country_id | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
data_lnk | 0 | 1.00 | 79 | 96 | 0 | 2 | 0 |
water_point_history | 0 | 1.00 | 142 | 834 | 0 | 4750 | 0 |
clean_country_id | 0 | 1.00 | 3 | 3 | 0 | 1 | 0 |
country_name | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
water_source | 0 | 1.00 | 8 | 30 | 0 | 4 | 0 |
water_tech | 0 | 1.00 | 5 | 37 | 0 | 20 | 0 |
adm2 | 0 | 1.00 | 3 | 14 | 0 | 33 | 0 |
adm3 | 4760 | 0.00 | NA | NA | 0 | 0 | 0 |
management | 1573 | 0.67 | 5 | 47 | 0 | 7 | 0 |
adm1 | 0 | 1.00 | 4 | 5 | 0 | 4 | 0 |
New Georeferenced Column | 0 | 1.00 | 16 | 35 | 0 | 4760 | 0 |
lat_lon_deg | 0 | 1.00 | 13 | 32 | 0 | 4760 | 0 |
public_data_source | 0 | 1.00 | 84 | 102 | 0 | 2 | 0 |
converted | 0 | 1.00 | 53 | 53 | 0 | 1 | 0 |
created_timestamp | 0 | 1.00 | 22 | 22 | 0 | 2 | 0 |
updated_timestamp | 0 | 1.00 | 22 | 22 | 0 | 2 | 0 |
Geometry | 0 | 1.00 | 33 | 37 | 0 | 4760 | 0 |
ADM2_EN | 0 | 1.00 | 3 | 14 | 0 | 30 | 0 |
ADM2_PCODE | 0 | 1.00 | 8 | 8 | 0 | 30 | 0 |
ADM1_EN | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
ADM1_PCODE | 0 | 1.00 | 5 | 5 | 0 | 1 | 0 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
rehab_year | 4760 | 0 | NaN | : |
rehabilitator | 4760 | 0 | NaN | : |
is_urban | 0 | 1 | 0.39 | FAL: 2884, TRU: 1876 |
latest_record | 0 | 1 | 1.00 | TRU: 4760 |
status | 0 | 1 | 0.56 | TRU: 2642, FAL: 2118 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
row_id | 0 | 1.00 | 68550.48 | 10216.94 | 49601.00 | 66874.75 | 68244.50 | 69562.25 | 471319.00 | ▇▁▁▁▁ |
lat_deg | 0 | 1.00 | 7.68 | 0.22 | 7.06 | 7.51 | 7.71 | 7.88 | 8.06 | ▁▂▇▇▇ |
lon_deg | 0 | 1.00 | 4.54 | 0.21 | 4.08 | 4.36 | 4.56 | 4.71 | 5.06 | ▃▆▇▇▂ |
install_year | 1144 | 0.76 | 2008.63 | 6.04 | 1917.00 | 2006.00 | 2010.00 | 2013.00 | 2015.00 | ▁▁▁▁▇ |
fecal_coliform_value | 4760 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
distance_to_primary_road | 0 | 1.00 | 5021.53 | 5648.34 | 0.01 | 719.36 | 2972.78 | 7314.73 | 26909.86 | ▇▂▁▁▁ |
distance_to_secondary_road | 0 | 1.00 | 3750.47 | 3938.63 | 0.15 | 460.90 | 2554.25 | 5791.94 | 19559.48 | ▇▃▁▁▁ |
distance_to_tertiary_road | 0 | 1.00 | 1259.28 | 1680.04 | 0.02 | 121.25 | 521.77 | 1834.42 | 10966.27 | ▇▂▁▁▁ |
distance_to_city | 0 | 1.00 | 16663.99 | 10960.82 | 53.05 | 7930.75 | 15030.41 | 24255.75 | 47934.34 | ▇▇▆▃▁ |
distance_to_town | 0 | 1.00 | 16726.59 | 12452.65 | 30.00 | 6876.92 | 12204.53 | 27739.46 | 44020.64 | ▇▅▃▃▂ |
rehab_priority | 2654 | 0.44 | 489.33 | 1658.81 | 0.00 | 7.00 | 91.50 | 376.25 | 29697.00 | ▇▁▁▁▁ |
water_point_population | 4 | 1.00 | 513.58 | 1458.92 | 0.00 | 14.00 | 119.00 | 433.25 | 29697.00 | ▇▁▁▁▁ |
local_population_1km | 4 | 1.00 | 2727.16 | 4189.46 | 0.00 | 176.00 | 1032.00 | 3717.00 | 36118.00 | ▇▁▁▁▁ |
crucialness_score | 798 | 0.83 | 0.26 | 0.28 | 0.00 | 0.07 | 0.15 | 0.35 | 1.00 | ▇▃▁▁▁ |
pressure_score | 798 | 0.83 | 1.46 | 4.16 | 0.00 | 0.12 | 0.41 | 1.24 | 93.69 | ▇▁▁▁▁ |
usage_capacity | 0 | 1.00 | 560.74 | 338.46 | 300.00 | 300.00 | 300.00 | 1000.00 | 1000.00 | ▇▁▁▁▅ |
days_since_report | 0 | 1.00 | 2692.69 | 41.92 | 1483.00 | 2688.00 | 2693.00 | 2700.00 | 4645.00 | ▁▇▁▁▁ |
staleness_score | 0 | 1.00 | 42.80 | 0.58 | 23.13 | 42.70 | 42.79 | 42.86 | 62.66 | ▁▁▇▁▁ |
location_id | 0 | 1.00 | 235865.49 | 6657.60 | 23741.00 | 230638.75 | 236199.50 | 240061.25 | 267454.00 | ▁▁▁▁▇ |
cluster_size | 0 | 1.00 | 1.05 | 0.25 | 1.00 | 1.00 | 1.00 | 1.00 | 4.00 | ▇▁▁▁▁ |
lat_deg_original | 4760 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
lon_deg_original | 4760 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
count | 0 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
There are 1144 missing value in install_year
column so we won’t use it.
<-osun_wp_sf %>%
osun_wp_sf_cleanfilter_at(vars(status,
distance_to_primary_road,
distance_to_secondary_road,
distance_to_tertiary_road,
distance_to_city,
distance_to_town,
water_point_population,
local_population_1km,
usage_capacity,
is_urban,
water_source_clean),all_vars(!is.na(.))) %>%
mutate(usage_capacity=as.factor(usage_capacity))
%>%
osun_wp_sf_clean freq(input='status')
status frequency percentage cumulative_perc
1 TRUE 2642 55.55 55.55
2 FALSE 2114 44.45 100.00
%>%
osun_wp_sf_clean skim()
Warning: Couldn't find skimmers for class: sfc_POINT, sfc; No user-defined `sfl`
provided. Falling back to `character`.
Name | Piped data |
Number of rows | 4756 |
Number of columns | 75 |
_______________________ | |
Column type frequency: | |
character | 47 |
factor | 1 |
logical | 5 |
numeric | 22 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
source | 0 | 1.00 | 5 | 44 | 0 | 2 | 0 |
report_date | 0 | 1.00 | 22 | 22 | 0 | 42 | 0 |
status_id | 0 | 1.00 | 2 | 7 | 0 | 3 | 0 |
water_source_clean | 0 | 1.00 | 8 | 22 | 0 | 3 | 0 |
water_source_category | 0 | 1.00 | 4 | 6 | 0 | 2 | 0 |
water_tech_clean | 23 | 1.00 | 9 | 23 | 0 | 3 | 0 |
water_tech_category | 23 | 1.00 | 9 | 15 | 0 | 2 | 0 |
facility_type | 0 | 1.00 | 8 | 8 | 0 | 1 | 0 |
clean_country_name | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
clean_adm1 | 0 | 1.00 | 3 | 5 | 0 | 5 | 0 |
clean_adm2 | 0 | 1.00 | 3 | 14 | 0 | 35 | 0 |
clean_adm3 | 4756 | 0.00 | NA | NA | 0 | 0 | 0 |
clean_adm4 | 4756 | 0.00 | NA | NA | 0 | 0 | 0 |
installer | 4756 | 0.00 | NA | NA | 0 | 0 | 0 |
management_clean | 1569 | 0.67 | 5 | 37 | 0 | 7 | 0 |
status_clean | 0 | 1.00 | 9 | 32 | 0 | 6 | 0 |
pay | 0 | 1.00 | 2 | 39 | 0 | 7 | 0 |
fecal_coliform_presence | 4756 | 0.00 | NA | NA | 0 | 0 | 0 |
subjective_quality | 0 | 1.00 | 18 | 20 | 0 | 4 | 0 |
activity_id | 4753 | 0.00 | 36 | 36 | 0 | 3 | 0 |
scheme_id | 4756 | 0.00 | NA | NA | 0 | 0 | 0 |
wpdx_id | 0 | 1.00 | 12 | 12 | 0 | 4756 | 0 |
notes | 0 | 1.00 | 2 | 96 | 0 | 3499 | 0 |
orig_lnk | 4753 | 0.00 | 84 | 84 | 0 | 1 | 0 |
photo_lnk | 41 | 0.99 | 84 | 84 | 0 | 4715 | 0 |
country_id | 0 | 1.00 | 2 | 2 | 0 | 1 | 0 |
data_lnk | 0 | 1.00 | 79 | 96 | 0 | 2 | 0 |
water_point_history | 0 | 1.00 | 142 | 834 | 0 | 4746 | 0 |
clean_country_id | 0 | 1.00 | 3 | 3 | 0 | 1 | 0 |
country_name | 0 | 1.00 | 7 | 7 | 0 | 1 | 0 |
water_source | 0 | 1.00 | 8 | 30 | 0 | 4 | 0 |
water_tech | 0 | 1.00 | 5 | 37 | 0 | 19 | 0 |
adm2 | 0 | 1.00 | 3 | 14 | 0 | 33 | 0 |
adm3 | 4756 | 0.00 | NA | NA | 0 | 0 | 0 |
management | 1569 | 0.67 | 5 | 47 | 0 | 7 | 0 |
adm1 | 0 | 1.00 | 4 | 5 | 0 | 4 | 0 |
New Georeferenced Column | 0 | 1.00 | 16 | 35 | 0 | 4756 | 0 |
lat_lon_deg | 0 | 1.00 | 13 | 32 | 0 | 4756 | 0 |
public_data_source | 0 | 1.00 | 84 | 102 | 0 | 2 | 0 |
converted | 0 | 1.00 | 53 | 53 | 0 | 1 | 0 |
created_timestamp | 0 | 1.00 | 22 | 22 | 0 | 2 | 0 |
updated_timestamp | 0 | 1.00 | 22 | 22 | 0 | 2 | 0 |
Geometry | 0 | 1.00 | 33 | 37 | 0 | 4756 | 0 |
ADM2_EN | 0 | 1.00 | 3 | 14 | 0 | 30 | 0 |
ADM2_PCODE | 0 | 1.00 | 8 | 8 | 0 | 30 | 0 |
ADM1_EN | 0 | 1.00 | 4 | 4 | 0 | 1 | 0 |
ADM1_PCODE | 0 | 1.00 | 5 | 5 | 0 | 1 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
usage_capacity | 0 | 1 | FALSE | 2 | 300: 2986, 100: 1770 |
Variable type: logical
skim_variable | n_missing | complete_rate | mean | count |
---|---|---|---|---|
rehab_year | 4756 | 0 | NaN | : |
rehabilitator | 4756 | 0 | NaN | : |
is_urban | 0 | 1 | 0.39 | FAL: 2882, TRU: 1874 |
latest_record | 0 | 1 | 1.00 | TRU: 4756 |
status | 0 | 1 | 0.56 | TRU: 2642, FAL: 2114 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
row_id | 0 | 1.00 | 68551.03 | 10221.05 | 49601.00 | 66875.75 | 68244.50 | 69562.25 | 471319.00 | ▇▁▁▁▁ |
lat_deg | 0 | 1.00 | 7.68 | 0.22 | 7.06 | 7.51 | 7.71 | 7.88 | 8.06 | ▁▂▇▇▇ |
lon_deg | 0 | 1.00 | 4.54 | 0.21 | 4.08 | 4.36 | 4.56 | 4.71 | 5.06 | ▃▆▇▇▂ |
install_year | 1143 | 0.76 | 2008.63 | 6.04 | 1917.00 | 2006.00 | 2010.00 | 2013.00 | 2015.00 | ▁▁▁▁▇ |
fecal_coliform_value | 4756 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
distance_to_primary_road | 0 | 1.00 | 5021.73 | 5650.02 | 0.01 | 719.36 | 2968.38 | 7314.73 | 26909.86 | ▇▂▁▁▁ |
distance_to_secondary_road | 0 | 1.00 | 3751.00 | 3939.74 | 0.15 | 460.50 | 2554.25 | 5791.94 | 19559.48 | ▇▃▁▁▁ |
distance_to_tertiary_road | 0 | 1.00 | 1259.65 | 1680.52 | 0.02 | 121.33 | 521.77 | 1834.42 | 10966.27 | ▇▂▁▁▁ |
distance_to_city | 0 | 1.00 | 16662.78 | 10961.08 | 53.05 | 7930.75 | 15020.40 | 24255.75 | 47934.34 | ▇▇▆▃▁ |
distance_to_town | 0 | 1.00 | 16732.33 | 12455.76 | 30.00 | 6876.92 | 12215.09 | 27745.52 | 44020.64 | ▇▅▃▃▂ |
rehab_priority | 2650 | 0.44 | 489.33 | 1658.81 | 0.00 | 7.00 | 91.50 | 376.25 | 29697.00 | ▇▁▁▁▁ |
water_point_population | 0 | 1.00 | 513.58 | 1458.92 | 0.00 | 14.00 | 119.00 | 433.25 | 29697.00 | ▇▁▁▁▁ |
local_population_1km | 0 | 1.00 | 2727.16 | 4189.46 | 0.00 | 176.00 | 1032.00 | 3717.00 | 36118.00 | ▇▁▁▁▁ |
crucialness_score | 794 | 0.83 | 0.26 | 0.28 | 0.00 | 0.07 | 0.15 | 0.35 | 1.00 | ▇▃▁▁▁ |
pressure_score | 794 | 0.83 | 1.46 | 4.16 | 0.00 | 0.12 | 0.41 | 1.24 | 93.69 | ▇▁▁▁▁ |
days_since_report | 0 | 1.00 | 2692.69 | 41.94 | 1483.00 | 2688.00 | 2693.00 | 2700.00 | 4645.00 | ▁▇▁▁▁ |
staleness_score | 0 | 1.00 | 42.80 | 0.58 | 23.13 | 42.70 | 42.79 | 42.86 | 62.66 | ▁▁▇▁▁ |
location_id | 0 | 1.00 | 235864.87 | 6659.44 | 23741.00 | 230638.75 | 236198.50 | 240062.25 | 267454.00 | ▁▁▁▁▇ |
cluster_size | 0 | 1.00 | 1.05 | 0.25 | 1.00 | 1.00 | 1.00 | 1.00 | 4.00 | ▇▁▁▁▁ |
lat_deg_original | 4756 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
lon_deg_original | 4756 | 0.00 | NaN | NA | NA | NA | NA | NA | NA | |
count | 0 | 1.00 | 1.00 | 0.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | ▁▁▇▁▁ |
Correlation Analysis
<- osun_wp_sf_clean %>%
osun_wp select(c(7,35:39,42:43,46:47,57))%>%
st_set_geometry(NULL)#drop away the geometry column
=cor(
cluster_vars.cor2:7])
osun_wp[,corrplot.mixed(cluster_vars.cor,
lower="ellipse",
upper="number",
tl.pos = "lt",
diag="l",
tl.col="black")
Building a logistic regression model
<-glm(status ~ distance_to_primary_road+
model+
distance_to_secondary_road+
distance_to_tertiary_road+
distance_to_city+
distance_to_town+
water_point_population+
local_population_1km+
usage_capacity+
is_urban
water_source_clean,data=osun_wp_sf_clean,
family=binomial(link='logit'))
blr_regress(model)
Model Overview
------------------------------------------------------------------------
Data Set Resp Var Obs. Df. Model Df. Residual Convergence
------------------------------------------------------------------------
data status 4756 4755 4744 TRUE
------------------------------------------------------------------------
Response Summary
--------------------------------------------------------
Outcome Frequency Outcome Frequency
--------------------------------------------------------
0 2114 1 2642
--------------------------------------------------------
Maximum Likelihood Estimates
-----------------------------------------------------------------------------------------------
Parameter DF Estimate Std. Error z value Pr(>|z|)
-----------------------------------------------------------------------------------------------
(Intercept) 1 0.3887 0.1124 3.4588 5e-04
distance_to_primary_road 1 0.0000 0.0000 -0.7153 0.4744
distance_to_secondary_road 1 0.0000 0.0000 -0.5530 0.5802
distance_to_tertiary_road 1 1e-04 0.0000 4.6708 0.0000
distance_to_city 1 0.0000 0.0000 -4.7574 0.0000
distance_to_town 1 0.0000 0.0000 -4.9170 0.0000
water_point_population 1 -5e-04 0.0000 -11.3686 0.0000
local_population_1km 1 3e-04 0.0000 19.2953 0.0000
usage_capacity1000 1 -0.6230 0.0697 -8.9366 0.0000
is_urbanTRUE 1 -0.2971 0.0819 -3.6294 3e-04
water_source_cleanProtected Shallow Well 1 0.5040 0.0857 5.8783 0.0000
water_source_cleanProtected Spring 1 1.2882 0.4388 2.9359 0.0033
-----------------------------------------------------------------------------------------------
Association of Predicted Probabilities and Observed Responses
---------------------------------------------------------------
% Concordant 0.7347 Somers' D 0.4693
% Discordant 0.2653 Gamma 0.4693
% Tied 0.0000 Tau-a 0.2318
Pairs 5585188 c 0.7347
---------------------------------------------------------------
distance_to_tertiary_road, distance_to_city, distance
#report(model)
blr_confusion_matrix(model,cutoff=0.5)
Confusion Matrix and Statistics
Reference
Prediction FALSE TRUE
0 1301 738
1 813 1904
Accuracy : 0.6739
No Information Rate : 0.4445
Kappa : 0.3373
McNemars's Test P-Value : 0.0602
Sensitivity : 0.7207
Specificity : 0.6154
Pos Pred Value : 0.7008
Neg Pred Value : 0.6381
Prevalence : 0.5555
Detection Rate : 0.4003
Detection Prevalence : 0.5713
Balanced Accuracy : 0.6680
Precision : 0.7008
Recall : 0.7207
'Positive' Class : 1
Building a Geographically weight logistic regression (GWLR) model
Converting from sf to sp data frame
<-osun_wp_sf_clean %>%
osun_wp_spselect(c(status,
distance_to_primary_road,
distance_to_secondary_road,
distance_to_tertiary_road,
distance_to_city,
distance_to_town,
water_point_population,
local_population_1km,
usage_capacity,
is_urban,%>%
water_source_clean)) as_Spatial()
osun_wp_sp
class : SpatialPointsDataFrame
features : 4756
extent : 182502.4, 290751, 340054.1, 450905.3 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 11
names : status, distance_to_primary_road, distance_to_secondary_road, distance_to_tertiary_road, distance_to_city, distance_to_town, water_point_population, local_population_1km, usage_capacity, is_urban, water_source_clean
min values : 0, 0.014461356813335, 0.152195902540837, 0.017815121653488, 53.0461399623541, 30.0019777713073, 0, 0, 1000, 0, Borehole
max values : 1, 26909.8616132094, 19559.4793799085, 10966.2705628969, 47934.343603562, 44020.6393368124, 29697, 36118, 300, 1, Protected Spring
Building Fixed Bandwidth GWR Model
Computing fixed bandwidth
<- bw.ggwr(status~distance_to_primary_road+
bw.fixed+
distance_to_secondary_road+
distance_to_tertiary_road+
distance_to_city+
distance_to_town+
water_point_population+
local_population_1km+
usage_capacity+
is_urban
water_source_clean,data=osun_wp_sp,
family="binomial",
approach = "AIC",
kernel = "gaussian",
adaptive = FALSE,
longlat = FALSE)
bw.fixed
<-ggwr.basic(status~distance_to_primary_road+
gwlr.fixed+
distance_to_secondary_road+
distance_to_tertiary_road+
distance_to_city+
distance_to_town+
water_point_population+
local_population_1km+
usage_capacity+
is_urban
water_source_clean,data=osun_wp_sp,
bw=2597.255,
family="binomial",
kernel="gaussian",
adaptive=FALSE,
longlat=FALSE)
Iteration Log-Likelihood
=========================
0 -1957
1 -1675
2 -1525
3 -1441
4 -1403
5 -1403
gwlr.fixed
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2022-12-17 17:38:35
Call:
ggwr.basic(formula = status ~ distance_to_primary_road + distance_to_secondary_road +
distance_to_tertiary_road + distance_to_city + distance_to_town +
water_point_population + local_population_1km + usage_capacity +
is_urban + water_source_clean, data = osun_wp_sp, bw = 2597.255,
family = "binomial", kernel = "gaussian", adaptive = FALSE,
longlat = FALSE)
Dependent (y) variable: status
Independent variables: distance_to_primary_road distance_to_secondary_road distance_to_tertiary_road distance_to_city distance_to_town water_point_population local_population_1km usage_capacity is_urban water_source_clean
Number of data points: 4756
Used family: binomial
***********************************************************************
* Results of Generalized linear Regression *
***********************************************************************
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-124.555 -1.755 1.072 1.742 34.333
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Intercept 3.887e-01 1.124e-01 3.459 0.000543
distance_to_primary_road -4.642e-06 6.490e-06 -0.715 0.474422
distance_to_secondary_road -5.143e-06 9.299e-06 -0.553 0.580230
distance_to_tertiary_road 9.683e-05 2.073e-05 4.671 3.00e-06
distance_to_city -1.686e-05 3.544e-06 -4.757 1.96e-06
distance_to_town -1.480e-05 3.009e-06 -4.917 8.79e-07
water_point_population -5.097e-04 4.484e-05 -11.369 < 2e-16
local_population_1km 3.451e-04 1.788e-05 19.295 < 2e-16
usage_capacity1000 -6.230e-01 6.972e-02 -8.937 < 2e-16
is_urbanTRUE -2.971e-01 8.185e-02 -3.629 0.000284
water_source_cleanProtected Shallow Well 5.040e-01 8.574e-02 5.878 4.14e-09
water_source_cleanProtected Spring 1.288e+00 4.388e-01 2.936 0.003325
Intercept ***
distance_to_primary_road
distance_to_secondary_road
distance_to_tertiary_road ***
distance_to_city ***
distance_to_town ***
water_point_population ***
local_population_1km ***
usage_capacity1000 ***
is_urbanTRUE ***
water_source_cleanProtected Shallow Well ***
water_source_cleanProtected Spring **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6534.5 on 4755 degrees of freedom
Residual deviance: 5688.0 on 4744 degrees of freedom
AIC: 5712
Number of Fisher Scoring iterations: 5
AICc: 5712.099
Pseudo R-square value: 0.1295351
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Fixed bandwidth: 2597.255
Regression points: the same locations as observations are used.
Distance metric: A distance matrix is specified for this model calibration.
************Summary of Generalized GWR coefficient estimates:**********
Min. 1st Qu. Median
Intercept -8.9630e+02 -4.9805e+00 1.7599e+00
distance_to_primary_road -1.9477e-02 -4.8092e-04 3.0174e-05
distance_to_secondary_road -1.5757e-02 -3.7583e-04 1.2438e-04
distance_to_tertiary_road -1.5673e-02 -4.2538e-04 7.6217e-05
distance_to_city -1.8447e-02 -5.6287e-04 -1.2745e-04
distance_to_town -2.2450e-02 -5.7335e-04 -1.5218e-04
water_point_population -5.2830e-02 -2.2810e-03 -9.8829e-04
local_population_1km -1.2757e-01 5.0016e-04 1.0647e-03
usage_capacity1000 -2.0846e+01 -9.7311e-01 -4.1596e-01
is_urbanTRUE -1.9866e+02 -4.3054e+00 -1.6908e+00
water_source_cleanProtected.Shallow.Well -2.0782e+01 -4.5536e-01 5.3278e-01
water_source_cleanProtected.Spring -5.2495e+02 -5.5983e+00 2.5500e+00
3rd Qu. Max.
Intercept 1.2829e+01 1075.4234
distance_to_primary_road 4.8497e-04 0.0143
distance_to_secondary_road 6.0665e-04 0.0259
distance_to_tertiary_road 6.7104e-04 0.0129
distance_to_city 2.3763e-04 0.0155
distance_to_town 1.9318e-04 0.0225
water_point_population 5.0564e-04 0.1313
local_population_1km 1.8177e-03 0.0392
usage_capacity1000 3.0334e-01 5.9492
is_urbanTRUE 1.2864e+00 746.9498
water_source_cleanProtected.Shallow.Well 1.7870e+00 67.5549
water_source_cleanProtected.Spring 6.7736e+00 331.1243
************************Diagnostic information*************************
Number of data points: 4756
GW Deviance: 2792.323
AIC : 4413.603
AICc : 4747.217
Pseudo R-square value: 0.5726785
***********************************************************************
Program stops at: 2022-12-17 17:39:12
Model Assessment
Converting SDF into sf data frame
<-as.data.frame(gwlr.fixed$SDF) gwr.fixed
<-gwr.fixed %>%
gwr.fixedmutate(most=ifelse(gwr.fixed$yhat>=0.5,T,F))
$y<- as.factor(gwr.fixed$y)
gwr.fixed$most<- as.factor(gwr.fixed$most)
gwr.fixed<-confusionMatrix(data=gwr.fixed$most,reference=gwr.fixed$y)
CM CM
Confusion Matrix and Statistics
Reference
Prediction FALSE TRUE
FALSE 1824 263
TRUE 290 2379
Accuracy : 0.8837
95% CI : (0.8743, 0.8927)
No Information Rate : 0.5555
P-Value [Acc > NIR] : <2e-16
Kappa : 0.7642
Mcnemar's Test P-Value : 0.2689
Sensitivity : 0.8628
Specificity : 0.9005
Pos Pred Value : 0.8740
Neg Pred Value : 0.8913
Prevalence : 0.4445
Detection Rate : 0.3835
Detection Prevalence : 0.4388
Balanced Accuracy : 0.8816
'Positive' Class : FALSE
Visualizing gwLR
<-osun_wp_sf_clean%>%
osun_wp_sf_selectedselect(c(ADM2_EN,ADM2_PCODE,ADM1_EN,ADM1_PCODE,status
))
<-cbind(osun_wp_sf_selected,gwr.fixed) gwr_sf.fixed
tmap_mode("view")
tmap mode set to interactive viewing
<-tm_shape(osun)+
prob_Ttm_polygons(alpha=0.1)+
tm_shape(gwr_sf.fixed)+
tm_dots(col="yhat",
border.col="gray60",
border.lwd=1)+
tm_view(set.zoom.limits = c(8,14))
prob_T
Re-calibrate the model without insignificant variables
From the above logistic regression model result, the p-value of distance_to_primary_road
and distance_to_secondary_road
are 0.4744 and 0.5802. Both of them and higher than 0.05, which means those variables are not significant. Now we re-calibrate the logistic regression model and GWLR model by excluding the two independent variables that are not statistically significant during the initial round of model calibration.
Building the new logistic regression model
<-glm(status ~
model_new+
distance_to_tertiary_road+
distance_to_city+
distance_to_town+
water_point_population+
local_population_1km+
usage_capacity+
is_urban
water_source_clean,data=osun_wp_sf_clean,
family=binomial(link='logit'))
blr_regress(model_new)
Model Overview
------------------------------------------------------------------------
Data Set Resp Var Obs. Df. Model Df. Residual Convergence
------------------------------------------------------------------------
data status 4756 4755 4746 TRUE
------------------------------------------------------------------------
Response Summary
--------------------------------------------------------
Outcome Frequency Outcome Frequency
--------------------------------------------------------
0 2114 1 2642
--------------------------------------------------------
Maximum Likelihood Estimates
-----------------------------------------------------------------------------------------------
Parameter DF Estimate Std. Error z value Pr(>|z|)
-----------------------------------------------------------------------------------------------
(Intercept) 1 0.3540 0.1055 3.3541 8e-04
distance_to_tertiary_road 1 1e-04 0.0000 4.9096 0.0000
distance_to_city 1 0.0000 0.0000 -5.2022 0.0000
distance_to_town 1 0.0000 0.0000 -5.4660 0.0000
water_point_population 1 -5e-04 0.0000 -11.3902 0.0000
local_population_1km 1 3e-04 0.0000 19.4069 0.0000
usage_capacity1000 1 -0.6206 0.0697 -8.9081 0.0000
is_urbanTRUE 1 -0.2667 0.0747 -3.5690 4e-04
water_source_cleanProtected Shallow Well 1 0.4947 0.0850 5.8228 0.0000
water_source_cleanProtected Spring 1 1.2790 0.4384 2.9174 0.0035
-----------------------------------------------------------------------------------------------
Association of Predicted Probabilities and Observed Responses
---------------------------------------------------------------
% Concordant 0.7349 Somers' D 0.4697
% Discordant 0.2651 Gamma 0.4697
% Tied 0.0000 Tau-a 0.2320
Pairs 5585188 c 0.7349
---------------------------------------------------------------
blr_confusion_matrix(model_new,cutoff=0.5)
Confusion Matrix and Statistics
Reference
Prediction FALSE TRUE
0 1300 743
1 814 1899
Accuracy : 0.6726
No Information Rate : 0.4445
Kappa : 0.3348
McNemars's Test P-Value : 0.0761
Sensitivity : 0.7188
Specificity : 0.6149
Pos Pred Value : 0.7000
Neg Pred Value : 0.6363
Prevalence : 0.5555
Detection Rate : 0.3993
Detection Prevalence : 0.5704
Balanced Accuracy : 0.6669
Precision : 0.7000
Recall : 0.7188
'Positive' Class : 1
Compare with the previous gwLR model, the accuracy decrease from 0.6739 to 0.6726.
Building a Geographically weight logistic regression (GWLR) model
Converting from sf to sp data frame
<-osun_wp_sf_clean %>%
osun_wp_sp_newselect(c(status,
distance_to_tertiary_road,
distance_to_city,
distance_to_town,
water_point_population,
local_population_1km,
usage_capacity,
is_urban,%>%
water_source_clean)) as_Spatial()
osun_wp_sp_new
class : SpatialPointsDataFrame
features : 4756
extent : 182502.4, 290751, 340054.1, 450905.3 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=4 +lon_0=8.5 +k=0.99975 +x_0=670553.98 +y_0=0 +a=6378249.145 +rf=293.465 +towgs84=-92,-93,122,0,0,0,0 +units=m +no_defs
variables : 9
names : status, distance_to_tertiary_road, distance_to_city, distance_to_town, water_point_population, local_population_1km, usage_capacity, is_urban, water_source_clean
min values : 0, 0.017815121653488, 53.0461399623541, 30.0019777713073, 0, 0, 1000, 0, Borehole
max values : 1, 10966.2705628969, 47934.343603562, 44020.6393368124, 29697, 36118, 300, 1, Protected Spring
Building Fixed Bandwidth GWR Model
Computing fixed bandwidth
<- bw.ggwr(status~
bw.fixed_new+
distance_to_tertiary_road+
distance_to_city+
distance_to_town+
water_point_population+
local_population_1km+
usage_capacity+
is_urban
water_source_clean,data=osun_wp_sp_new,
family="binomial",
approach = "AIC",
kernel = "gaussian",
adaptive = FALSE,
longlat = FALSE)
bw.fixed_new
<-ggwr.basic(status~
gwlr.fixed_new+
distance_to_tertiary_road+
distance_to_city+
distance_to_town+
water_point_population+
local_population_1km+
usage_capacity+
is_urban
water_source_clean,data=osun_wp_sp_new,
bw=2597.255,
family="binomial",
kernel="gaussian",
adaptive=FALSE,
longlat=FALSE)
Iteration Log-Likelihood
=========================
0 -2034
1 -1772
2 -1635
3 -1561
4 -1530
5 -1530
gwlr.fixed_new
***********************************************************************
* Package GWmodel *
***********************************************************************
Program starts at: 2022-12-17 17:39:14
Call:
ggwr.basic(formula = status ~ distance_to_tertiary_road + distance_to_city +
distance_to_town + water_point_population + local_population_1km +
usage_capacity + is_urban + water_source_clean, data = osun_wp_sp_new,
bw = 2597.255, family = "binomial", kernel = "gaussian",
adaptive = FALSE, longlat = FALSE)
Dependent (y) variable: status
Independent variables: distance_to_tertiary_road distance_to_city distance_to_town water_point_population local_population_1km usage_capacity is_urban water_source_clean
Number of data points: 4756
Used family: binomial
***********************************************************************
* Results of Generalized linear Regression *
***********************************************************************
Call:
NULL
Deviance Residuals:
Min 1Q Median 3Q Max
-129.368 -1.750 1.074 1.742 34.126
Coefficients:
Estimate Std. Error z value Pr(>|z|)
Intercept 3.540e-01 1.055e-01 3.354 0.000796
distance_to_tertiary_road 1.001e-04 2.040e-05 4.910 9.13e-07
distance_to_city -1.764e-05 3.391e-06 -5.202 1.97e-07
distance_to_town -1.544e-05 2.825e-06 -5.466 4.60e-08
water_point_population -5.098e-04 4.476e-05 -11.390 < 2e-16
local_population_1km 3.452e-04 1.779e-05 19.407 < 2e-16
usage_capacity1000 -6.206e-01 6.966e-02 -8.908 < 2e-16
is_urbanTRUE -2.667e-01 7.474e-02 -3.569 0.000358
water_source_cleanProtected Shallow Well 4.947e-01 8.496e-02 5.823 5.79e-09
water_source_cleanProtected Spring 1.279e+00 4.384e-01 2.917 0.003530
Intercept ***
distance_to_tertiary_road ***
distance_to_city ***
distance_to_town ***
water_point_population ***
local_population_1km ***
usage_capacity1000 ***
is_urbanTRUE ***
water_source_cleanProtected Shallow Well ***
water_source_cleanProtected Spring **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 6534.5 on 4755 degrees of freedom
Residual deviance: 5688.9 on 4746 degrees of freedom
AIC: 5708.9
Number of Fisher Scoring iterations: 5
AICc: 5708.923
Pseudo R-square value: 0.129406
***********************************************************************
* Results of Geographically Weighted Regression *
***********************************************************************
*********************Model calibration information*********************
Kernel function: gaussian
Fixed bandwidth: 2597.255
Regression points: the same locations as observations are used.
Distance metric: A distance matrix is specified for this model calibration.
************Summary of Generalized GWR coefficient estimates:**********
Min. 1st Qu. Median
Intercept -2.7771e+02 -3.9915e+00 2.9346e+00
distance_to_tertiary_road -2.0066e-02 -3.6016e-04 8.9385e-05
distance_to_city -3.0931e-02 -5.6273e-04 -1.0359e-04
distance_to_town -3.4702e-03 -4.3133e-04 -1.2398e-04
water_point_population -3.5450e-02 -2.0856e-03 -1.1271e-03
local_population_1km -5.8060e-02 4.0342e-04 1.0001e-03
usage_capacity1000 -4.5295e+01 -1.0249e+00 -3.8880e-01
is_urbanTRUE -3.0233e+02 -3.1725e+00 -1.4861e+00
water_source_cleanProtected.Shallow.Well -1.0470e+02 -4.2423e-01 5.9626e-01
water_source_cleanProtected.Spring -7.9160e+02 -5.4086e+00 2.5525e+00
3rd Qu. Max.
Intercept 1.0668e+01 1102.7459
distance_to_tertiary_road 5.3918e-04 0.0140
distance_to_city 1.2672e-04 0.0129
distance_to_town 2.2159e-04 0.0161
water_point_population 1.9400e-04 0.0569
local_population_1km 1.6838e-03 0.0293
usage_capacity1000 3.5031e-01 5.9152
is_urbanTRUE 8.9541e-01 739.6369
water_source_cleanProtected.Shallow.Well 1.8040e+00 52.4657
water_source_cleanProtected.Spring 6.5117e+00 152.2614
************************Diagnostic information*************************
Number of data points: 4756
GW Deviance: 3051.369
AIC : 4499.24
AICc : 4759.621
Pseudo R-square value: 0.5330355
***********************************************************************
Program stops at: 2022-12-17 17:39:42
Model Assessment
Converting SDF into sf data frame
<-as.data.frame(gwlr.fixed_new$SDF) gwr.fixed_new
<-gwr.fixed_new %>%
gwr.fixed_newmutate(most=ifelse(gwr.fixed_new$yhat>=0.5,T,F))
$y<- as.factor(gwr.fixed_new$y)
gwr.fixed_new$most<- as.factor(gwr.fixed_new$most)
gwr.fixed_new<-confusionMatrix(data=gwr.fixed_new$most,reference=gwr.fixed_new$y)
CM CM
Confusion Matrix and Statistics
Reference
Prediction FALSE TRUE
FALSE 1792 302
TRUE 322 2340
Accuracy : 0.8688
95% CI : (0.8589, 0.8783)
No Information Rate : 0.5555
P-Value [Acc > NIR] : <2e-16
Kappa : 0.7341
Mcnemar's Test P-Value : 0.4469
Sensitivity : 0.8477
Specificity : 0.8857
Pos Pred Value : 0.8558
Neg Pred Value : 0.8790
Prevalence : 0.4445
Detection Rate : 0.3768
Detection Prevalence : 0.4403
Balanced Accuracy : 0.8667
'Positive' Class : FALSE
Compare with the previous gwLR model, the accuracy decrease from 0.8837 to 0.8668.
Visualizing gwLR
<-osun_wp_sf_clean%>%
osun_wp_sf_selected_newselect(c(ADM2_EN,ADM2_PCODE,ADM1_EN,ADM1_PCODE,status
))
<-cbind(osun_wp_sf_selected_new,gwr.fixed_new) gwr_sf.fixed_new
tmap_mode("view")
tmap mode set to interactive viewing
<-tm_shape(osun)+
prob_Ttm_polygons(alpha=0.1)+
tm_shape(gwr_sf.fixed_new)+
tm_dots(col="yhat",
border.col="gray60",
border.lwd=1)+
tm_view(set.zoom.limits = c(8,14))
prob_T